FASTER DISCRETE COSINE TRANSFORMS USING SCALED TERMS

CROSS-REFERENCE TO RELATED APPLICATION 
This application is related to the following co-pending and commonly-assigned patent applications, which are hereby incorporated herein by reference in their respective entirety:

"FASTER TRANSFORMS USING SCALED TERMS" to Trelewicz et al., 
having attorney docket no. BLD9-2000-0059US1. 

"FASTER TRANSFORMS USING EARLY ABORTS AND PRECISION REFINEMENTS" to Mitchell et al., having attorney docket no. BLD9-2000-0064US1.



BACKGROUND OF THE INVENTION 

1. Field of the Invention.

This invention relates in general to data processing, and more particularly to 
faster discrete cosine transforms using scaled terms. 

2. Description of Related Art.

Transforms, which take data from one domain (e.g., sampled data) to another (e.g., frequency space), are used in many signal and/or image processing applications. Such transforms are used for a variety of applications, including, but not limited to, data analysis, feature identification and/or extraction, signal correlation, data compression, or data embedding. Many of these transforms require efficient implementation for real-time and/or fast execution.

Data compression is desirable in many data handling processes where too much data is present for practical applications using the data. Commonly, compression is used in communication links to reduce transmission time or required bandwidth. Similarly, compression is preferred in data storage systems, including digital printers and copiers, where "pages" of a document to be printed may be stored temporarily in memory. Here the amount of media space can be substantially reduced with compression. Generally speaking, scanned images, i.e., electronic representations of hard copy documents, are often large and thus make desirable candidates for compression.

It is well known in the art to use a discrete cosine transform for data compression. In particular examples referred to herein, the terms images and image processing will be used. However, those skilled in the art will recognize that the present invention is not meant to be limited to processing images but is applicable to processing different data, such as audio data, scientific data, image data, etc.

In combination with other techniques, such as color subsampling, quantization, Huffman coding and run-length coding, the discrete cosine transform can compress a digital color image by a factor of approximately thirty to one with virtually no noticeable image degradation. Because of its usefulness in data compression, the discrete cosine transform is an integral part of several data compression standards used by international standards committees such as the International Organization for Standardization (ISO).
The DCT (Discrete Cosine Transform) based compression system disseminated by the JPEG committee is a lossy compression system which reduces data redundancies based on pixel-to-pixel correlations. Generally, an image does not change very much on a pixel-to-pixel basis and therefore has what is known as "natural spatial correlation". In natural scenes, correlation is generalized, but not exact: noise makes each pixel somewhat different from its neighbors. Moreover, signal and data processing frequently needs to convert the input data into transform coefficients for the purposes of analysis. Often only a quantized version of the coefficients is needed (e.g., JPEG/MPEG data compression or audio/voice compression). Many such applications must run fast, in real time, such as the generation of JPEG data for high-speed printers.

An example of a JPEG DCT compression and decompression system may be found in the Encyclopedia of Graphics File Formats, by J. D. Murray and W. vanRyper, pp. 159-171 (1994, O'Reilly & Associates, Inc.). Further description of the draft JPEG standard may be found, for example, in "JPEG Still Image Data Compression Standard," by W. Pennebaker and J. Mitchell, 1993 (Van Nostrand Reinhold, New York) or "Discrete Cosine Transform: Algorithms, Advantages and Applications," by K. Rao and P. Yip, 1990 (Academic Press, San Diego).

The two-dimensional discrete cosine transform is a pair of mathematical equations that transforms one N1×N2 array of numbers to or from another N1×N2 array of numbers. The first array typically represents a square N×N array of spatially determined pixel values which form the digital image. The second array is an array of discrete cosine transform coefficients which represent the image in the frequency domain. This method of representing the image by the coefficients of its frequency components is closely related to the discrete Fourier transform. The discrete Fourier transform is the discrete version of the classic mathematical Fourier transform, wherein any periodic waveform may be expressed as a sum of sine and cosine waves of different frequencies and amplitudes. The discrete cosine transform, like the Fourier transform, is thus a transform which transforms a signal from the time domain into the frequency domain and vice versa. With an input image, A, the coefficients for the output "image," B, are:

$$
B(k_1,k_2)=\sum_{i=0}^{N_1-1}\sum_{j=0}^{N_2-1} 4\,A(i,j)\,\cos\left[\frac{\pi k_1}{2N_1}(2i+1)\right]\cos\left[\frac{\pi k_2}{2N_2}(2j+1)\right]
$$

For an image, the input is N2 pixels wide by N1 pixels high; A(i, j) is the intensity of the pixel in row i and column j; B(k1, k2) is the DCT coefficient in row k1 and column k2 of the DCT matrix. All DCT multiplications are real, which lowers the number of required multiplications compared to the discrete Fourier transform. For most images, much of the signal energy lies at low frequencies; these appear in the upper left corner of the DCT. The lower right values represent higher frequencies and are often small enough to be neglected with little visible distortion.

There are two basic discrete cosine transform equations. The first basic equation is the forward discrete cosine transform, which transforms the pixel values into the discrete cosine transform coefficients. The second basic equation is the inverse discrete cosine transform, which transforms the discrete cosine transform coefficients back into pixel values.

BLD9-2000-0056US1
ALG 501.376US01
Patent Application

Most applications of the discrete cosine transform for images use eight-by-eight arrays, so N has a value of eight. Assuming that N has the value of eight, and letting f(x, y) be the values of the pixel array and F(u, v) be the values of the discrete cosine transform coefficients, the equations of the discrete cosine transforms are as follows.

The formula for the 2-D discrete cosine transform is given by:

$$
F(u,v)=\frac{C_u C_v}{4}\sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\cos\left[\frac{(2x+1)u\pi}{16}\right]\cos\left[\frac{(2y+1)v\pi}{16}\right]
$$

where x, y are spatial coordinates in the spatial domain (0, 1, 2, ..., 7); u, v are coordinates in the transform domain (0, 1, 2, ..., 7); $C_u = 1/\sqrt{2}$ for u = 0, otherwise 1; and $C_v = 1/\sqrt{2}$ for v = 0, otherwise 1. The separable nature of the 2-D DCT is exploited by performing a 1-D DCT on the eight columns, and then a 1-D DCT on the eight rows of the result. Several fast algorithms are available to calculate the 8-point 1-D DCT.
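The separability described above can be checked numerically. The Python sketch below is illustrative only (the function names are hypothetical and it favors clarity over speed): it evaluates F(u, v) term by term from the 2-D formula, and again as eight 1-D DCTs along one index followed by eight 1-D DCTs along the other, using the same C_u normalization.

```python
import math

def dct_1d(v):
    """Normalized 8-point 1-D DCT: S(u) = (C_u/2) * sum_x v[x]*cos((2x+1)*u*pi/16)."""
    out = []
    for u in range(8):
        cu = 1 / math.sqrt(2) if u == 0 else 1.0
        acc = sum(v[x] * math.cos((2 * x + 1) * u * math.pi / 16) for x in range(8))
        out.append(cu / 2 * acc)
    return out

def dct_2d_direct(block):
    """F(u, v) evaluated directly from the 2-D formula (64 products per coefficient)."""
    F = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            cu = 1 / math.sqrt(2) if u == 0 else 1.0
            cv = 1 / math.sqrt(2) if v == 0 else 1.0
            acc = 0.0
            for x in range(8):
                for y in range(8):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / 16)
                            * math.cos((2 * y + 1) * v * math.pi / 16))
            F[u][v] = cu * cv / 4 * acc
    return F

def dct_2d_separable(block):
    """Eight 1-D DCTs along x, then eight 1-D DCTs along y of the intermediate result."""
    # Pass 1: for each y, transform block[0..7][y]; along_x[y][u].
    along_x = [dct_1d([block[x][y] for x in range(8)]) for y in range(8)]
    # Pass 2: for each u, transform along_x[0..7][u] over y; result[u][v].
    return [dct_1d([along_x[y][u] for y in range(8)]) for u in range(8)]
```

For a constant block of ones, both functions give F(0, 0) = 8 and (up to rounding) zero for every other coefficient, since all the energy is at zero frequency.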

As described above, a DCT compressor comprises mainly two parts. The first part transforms highly correlated image data into weakly correlated coefficients using a DCT, and the second part performs quantization on the coefficients to reduce the bit rate for transmission or storage. However, the computational burden of performing a DCT is demanding. For example, processing a one-dimensional DCT of length 8 pixels requires 13 multiplications and 29 additions in currently known fast algorithms. As stated above, the image is divided into square blocks of size 8 by 8 pixels, 16 by 16 pixels or 32 by 32 pixels. Each block is often processed by the one-dimensional DCT in row-by-row fashion followed by column-by-column. Different block sizes may be selected for compression because of different types of input and different quality requirements on the decompressed data.

In the article "A Fast DCT-SQ Scheme for Images," Trans. IEICE, Vol. E71, No. 11, pp. 1095-1097, Nov. 1988, Y. Arai, T. Agui, and M. Nakajima proposed that many of the DCT multiplications can be formulated as scaling multipliers to the DCT coefficients. The DCT after the multipliers are factored out is called the scaled DCT. The scaled DCT is still orthogonal but no longer normalized, and the scaling factors may be restored in a subsequent quantization process. Arai et al. demonstrated in their article that only 5 multiplications and 29 additions are required to process an 8-point scaled DCT.
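For reference, the following Python sketch shows one well-known arrangement of the Arai-Agui-Nakajima scaled 8-point DCT (the floating-point form used in the Independent JPEG Group's `jfdctflt.c`). It is offered as an illustration of the prior-art scaled DCT under that assumption, not as the code of any particular encoder. The five multiplications are marked; the output differs from the unnormalized sum X(k) = Σ f(x)·cos((2x+1)kπ/16) only by per-coefficient factors, which are the factors an encoder would fold into quantization.

```python
import math

def aan_scaled_dct8(d):
    """Scaled 8-point forward DCT, AAN arrangement: 5 multiplications, 29 additions.

    out[0] = X(0) and out[k] = 2*cos(k*pi/16)*X(k) for k = 1..7,
    where X(k) = sum_x d[x]*cos((2x+1)*k*pi/16).
    """
    # Stage 1: butterflies on mirrored input pairs.
    tmp0, tmp7 = d[0] + d[7], d[0] - d[7]
    tmp1, tmp6 = d[1] + d[6], d[1] - d[6]
    tmp2, tmp5 = d[2] + d[5], d[2] - d[5]
    tmp3, tmp4 = d[3] + d[4], d[3] - d[4]

    out = [0.0] * 8
    # Even part.
    tmp10, tmp13 = tmp0 + tmp3, tmp0 - tmp3
    tmp11, tmp12 = tmp1 + tmp2, tmp1 - tmp2
    out[0], out[4] = tmp10 + tmp11, tmp10 - tmp11
    z1 = (tmp12 + tmp13) * 0.707106781          # multiplication 1
    out[2], out[6] = tmp13 + z1, tmp13 - z1

    # Odd part.
    tmp10 = tmp4 + tmp5
    tmp11 = tmp5 + tmp6
    tmp12 = tmp6 + tmp7
    z5 = (tmp10 - tmp12) * 0.382683433           # multiplication 2
    z2 = 0.541196100 * tmp10 + z5                # multiplication 3
    z4 = 1.306562965 * tmp12 + z5                # multiplication 4
    z3 = tmp11 * 0.707106781                     # multiplication 5
    z11, z13 = tmp7 + z3, tmp7 - z3
    out[5], out[3] = z13 + z2, z13 - z2
    out[1], out[7] = z11 + z4, z11 - z4
    return out
```

Because the factors 2·cos(kπ/16) are the same for every block, they can be merged into the quantization table once, which is the property the present invention builds on.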

However, there is a need to further increase the speed of the encoder, because more than half of the time in the JPEG encoder is spent in the Forward Discrete Cosine Transform (FDCT) code calculating the two-dimensional (2-D) 8x8 block of 8-bit or 12-bit samples. Currently, the 2-D FDCT is calculated by first calculating eight 1-D horizontal DCTs and then calculating another eight 1-D vertical DCTs using the currently fastest known 1-D DCT, which was first described by Arai, Agui, and Nakajima, as mentioned above. The current process takes 29 additions and 5 multiplications to calculate a scaled version of the 1-D FDCT. The scaling constants are the same for each vertical column and can ultimately be included in the quantization step. This prior solution saved eight multiplications per 1-D FDCT (5 instead of 13). Nevertheless, there remains a need to provide a faster DCT.

It can also be seen that there is a need to provide a method and apparatus for performing discrete cosine transforms with fewer addition and multiplication steps to increase the throughput of an encoder.
SUMMARY OF THE INVENTION
To overcome the limitations in the prior art described above, and to overcome 
other limitations that will become apparent upon reading and understanding the 
present specification, the present invention discloses a faster discrete cosine 
transform that uses scaled terms. 

The present invention solves the above-described problems by decreasing the number of multiplications by using scaling terms on the transform constants. Further, after scaling, multiplications by the scaled constants are replaced by a short series of linear shifts and additions, since the scaled constants may be approximated by sums of powers-of-2 after only a few summations. Those skilled in the art will recognize that throughout this specification, the term "matrix" is used both in its traditional mathematical sense and also to cover all hardware and software systems which, when analyzed, could be equivalently represented as a mathematical matrix.

A method in accordance with the principles of the present invention includes 
arranging discrete cosine transform equations into at least one collection having at 
least two discrete cosine transform constants, scaling the discrete cosine transform 
equations in the at least one collection by dividing each of the discrete cosine 
transform constants in the collection by one of the discrete cosine transform constants 
from the at least one collection and representing each of the scaled discrete cosine 
transform constants with estimated scaled discrete cosine transform constants 
approximated by sums of powers-of-2. 

Other embodiments of a method in accordance with the principles of the invention may include alternative or optional additional aspects. One such aspect of the present invention is that the method further includes separating an image into at least one block and transforming the block into transformed data by performing matrix multiplication on the discrete cosine transform equations based upon binary arithmetic using the estimated scaled discrete cosine transform constants and performing linear shifts and additions.

Another aspect of the present invention is that the scaling the discrete cosine 
transform equations in the at least one collection by dividing each of the discrete 
cosine transform constants in the at least one collection by one of the discrete cosine 
transform constants from the at least one collection saves multiplications. 
Another aspect of the present invention is that the discrete cosine transform constant chosen for scaling the discrete cosine transform equations in the at least one collection is selected according to a predetermined cost function.

Another aspect of the present invention is that the cost function minimizes a number of add operations.

Another aspect of the present invention is that the cost function minimizes a worst case number of add operations.

Another aspect of the present invention is that the cost function minimizes an error per constant resulting from the approximations.

Another aspect of the present invention is that the transforming the block into transformed data further comprises generating at least one set of one-dimensional discrete cosine transform equations.
Another aspect of the present invention is that the discrete cosine transform constants are obtained by splitting the discrete cosine transform constants into even and odd terms by obtaining sums and differences of input samples.

Another aspect of the present invention is that the block is an N1×N2 block.

Another aspect of the present invention is that N1 = N2 = 8.

In another embodiment of the present invention, a data compression system is provided. The data compression system includes a discrete cosine transformer for applying a discrete cosine transform to decorrelate data into transformed data using discrete cosine transform equations, the discrete cosine transform equations having been formed by arranging the discrete cosine transform equations into at least one collection having at least two discrete cosine transform constants, scaling the discrete cosine transform equations in the at least one collection by dividing each of the discrete cosine transform constants in the collection by one of the discrete cosine transform constants from the at least one collection, and representing each of the scaled discrete cosine transform constants with estimated scaled discrete cosine transform constants approximated by sums of powers-of-2; and a quantizer for quantizing the transformed data into quantized data to reduce the number of bits needed to represent the transform coefficients.

In another embodiment of the present invention, a printer is provided. The printer includes a memory for storing data, a processor for processing the data to provide a compressed print stream output, and a printhead driving circuit for controlling a printhead to generate a printout of the data, wherein the processor applies a discrete cosine transform to decorrelate data into transform coefficients using discrete cosine transform equations, the discrete cosine transform equations having been formed by arranging the discrete cosine transform equations into at least one collection having at least two discrete cosine transform constants, scaling the discrete cosine transform equations in the at least one collection by dividing each of the discrete cosine transform constants in the collection by one of the discrete cosine transform constants from the at least one collection, and representing each of the scaled discrete cosine transform constants with estimated scaled discrete cosine transform constants approximated by sums of powers-of-2.

In another embodiment of the present invention, an article of manufacture is provided. The article of manufacture includes a program storage medium readable by a computer, the medium tangibly embodying one or more programs of instructions executable by the computer to perform a method for performing faster discrete cosine transforms, the method including arranging discrete cosine transform equations into at least one collection having at least two discrete cosine transform constants, scaling the discrete cosine transform equations in the at least one collection by dividing each of the discrete cosine transform constants in the collection by one of the discrete cosine transform constants from the at least one collection, and representing each of the scaled discrete cosine transform constants with estimated scaled discrete cosine transform constants approximated by sums of powers-of-2.

In another embodiment of the present invention, a data analysis system is provided. The data analysis system includes a memory for storing the discrete cosine transform equations, the discrete cosine transform equations having been formed by arranging the discrete cosine transform equations into at least one collection having at least two discrete cosine transform constants, scaling the discrete cosine transform equations in the at least one collection by dividing each of the discrete cosine transform constants in the collection by one of the discrete cosine transform constants from the at least one collection, and representing each of the scaled discrete cosine transform constants with estimated scaled discrete cosine transform constants approximated by sums of powers-of-2; and a transformer for applying the transform equations to perform a discrete cosine transform to decorrelate data into discrete cosine transform coefficients.

These and various other advantages and features of novelty which characterize 
the invention are pointed out with particularity in the claims annexed hereto and form a 
part hereof. However, for a better understanding of the invention, its advantages, and 
the objects obtained by its use, reference should be made to the drawings which form 
a further part hereof, and to accompanying descriptive matter, in which there are 
illustrated and described specific examples of an apparatus in accordance with the 
invention. 
BRIEF DESCRIPTION OF THE DRAWINGS
Referring now to the drawings in which like reference numbers represent corresponding parts throughout:

Fig. 1 illustrates a block diagram of a JPEG encoder;

Fig. 2 illustrates a table that shows the terms C2 and C6;

Fig. 3 is a table that shows the actual values if either C2 or C6 is used to scale the coefficient matrix and the estimates are a ratio of integers;

Fig. 4 shows all possible combinations of dividing the coefficient matrix by C1, C3, C5, and C7;

Fig. 5 shows the actual values when C5 is used to scale the coefficient matrix;

Fig. 6 shows the actual values when C7 is used to scale the coefficient matrix;

Fig. 7 shows the number of binary adds to generate the unscaled cosine terms within one percent error;

Fig. 8 illustrates a flow chart of the method to design the fast DCT using scaled terms according to the present invention;

Fig. 9 illustrates a printer according to the present invention;

Fig. 10 illustrates a data analyzing system according to the present invention; and

Fig. 11 illustrates another data analyzing system according to the present invention.
DETAILED DESCRIPTION OF THE INVENTION
In the following description of the exemplary embodiment, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration the specific embodiment in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention.

The present invention provides a faster discrete cosine transform that uses 
scaled terms. The present invention decreases the number of multiplications by 
using scaled terms for the transform constants. Further, after scaling, the ratios of 
the constants are replaced by a series of linear shifts and additions since the 
constants may be approximated by sums of powers-of-2 after only a few 
summations. 

Fig. 1 illustrates a block diagram of a JPEG encoder 100. In Fig. 1, digital image data 110 is divided up into 8 by 8 pixel blocks 112. Then, the discrete cosine transform (DCT) of each block is calculated 120. The DCT helps separate the data into parts (or spectral sub-bands) of differing importance (e.g., with respect to an image's visual quality). The DCT is similar to the discrete Fourier transform: it transforms a signal from the spatial domain to the frequency domain.

A quantizer 130 rounds off the DCT coefficients according to the quantization matrix. This step produces the "lossy" nature of JPEG, but allows for large compression ratios. There is a tradeoff between image quality and degree of quantization. A large quantization step size can produce unacceptably large image distortion; this effect is similar to quantizing Fourier series coefficients too coarsely. Unfortunately, finer quantization leads to lower compression ratios. The question is how to quantize the DCT coefficients most efficiently. Because of human eyesight's natural high-frequency roll-off, high frequencies play a less important role than low frequencies. This lets JPEG use a much larger step size for the high-frequency coefficients, with little noticeable image deterioration. The quantized coefficient output may then be encoded by an optional entropy encoder 140 to produce a compressed data stream 150 to an output file. Note that for JPEG the entropy encoder is not optional, but other similar data compression systems could be designed without the CPU cycles required by the entropy encoder.

However, as described below, there is a need to provide a method and apparatus for performing discrete cosine transforms with fewer multiplication steps to increase the throughput of the encoder 100. As will be described, the method according to the present invention saves multiplications in the brute force equations by scaling the coefficient matrix. Each separable subgroup (collection) is scaled independently of the other collections. Within each collection, the remaining multiplications are replaced by simple shifts and adds. The scaling terms are chosen according to various cost functions. The preferred embodiment uses cost functions that minimize the number of adds and that minimize the worst case number of adds. However, those skilled in the art will recognize that alternate cost functions could instead choose how much error is allowed per coefficient. Moreover, those skilled in the art will recognize that the inverse DCT (IDCT) can be implemented using the same method, so that the same number of operations is used.

Let Cn = cos(n·π/16). Let f(x) for x = 0 to 7 be the input samples. Let F(u) for u = 0 to 7 be proportional to the 1-D transform coefficients. Let S(u) for u = 0 to 7 be equal to the 1-D transform coefficients. Let sxy be f(x) + f(y) and dxy be f(x) - f(y) for x, y = 0 to 7. Note that the calculations for F(u) below ignore some constants that end up in the quantization terms. The first step in the 1-D DCT is to split the coefficients into the even and odd terms by taking sums and differences of the eight input samples:

s07 = f(0) + f(7)
s16 = f(1) + f(6)
s25 = f(2) + f(5)
s34 = f(3) + f(4)

d07 = f(0) - f(7)
d16 = f(1) - f(6)
d25 = f(2) - f(5)
d34 = f(3) - f(4)

Then, the sums and differences of the sums are calculated:

s0734 = s07 + s34
s1625 = s16 + s25
d0734 = s07 - s34
d1625 = s16 - s25

Finally, sums and differences of the sums are taken again to calculate F(0) and F(4):

F(0) = s0734 + s1625
F(4) = s0734 - s1625
2*S(0) = C4*(s0734 + s1625)
2*S(4) = C4*(s0734 - s1625)
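The butterflies above can be sketched in Python as follows, keeping the document's s/d naming. The reference function `S_direct` is a hypothetical helper that evaluates S(u) from the normalized definition, so that 2*S(0) = C4*F(0) and 2*S(4) = C4*F(4) can be checked; the d-terms are returned for later use by the F(2)/F(6) and odd-part equations.

```python
import math

C4 = math.cos(4 * math.pi / 16)  # C4 = cos(4*pi/16) = 1/sqrt(2)

def even_butterflies(f):
    """Sums and differences of the eight samples; F(0) and F(4) need additions only."""
    s07, s16, s25, s34 = f[0] + f[7], f[1] + f[6], f[2] + f[5], f[3] + f[4]
    d07, d16, d25, d34 = f[0] - f[7], f[1] - f[6], f[2] - f[5], f[3] - f[4]
    s0734, s1625 = s07 + s34, s16 + s25
    d0734, d1625 = s07 - s34, s16 - s25
    F0 = s0734 + s1625
    F4 = s0734 - s1625
    return F0, F4, d0734, d1625, (d07, d16, d25, d34)

def S_direct(f, u):
    """Reference S(u) = (C_u/2) * sum_x f(x)*cos((2x+1)*u*pi/16)."""
    cu = 1 / math.sqrt(2) if u == 0 else 1.0
    return cu / 2 * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / 16) for x in range(8))
```

For integer input, F(0) and F(4) come out as exact integers; the common factor C4 is left for the quantizer rather than applied per block.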
Those skilled in the art will recognize that the equations above have been derived in a manner similar to the Arai, Agui, and Nakajima algorithm. F(2) and F(6) can be calculated from these terms as:

F(2) = C2*d0734 + C6*d1625
F(6) = C6*d0734 - C2*d1625
2*S(2) = C2*d0734 + C6*d1625
2*S(6) = C6*d0734 - C2*d1625

Fig. 2 illustrates a table 200 that shows the C2 and C6 terms divided by C2 or C6: C2 = 0.923880, C2/C2 = 1.000000, C2/C6 = $1+\sqrt{2}$, C6 = 0.382683, C6/C2 = $1/(1+\sqrt{2})$, and C6/C6 = 1.000000.

Next, the utility of scaling the constants in the matrix by C6 or C2 is investigated. 

Fig. 3 is a table 300 that shows the actual values 310 if either C2 or C6 is used to scale the constants in the matrix, where the estimates 312 are ratios of integers. Fig. 3 also shows the error 314 due to each of the approximations. C2 was chosen because the greatest error then occurs in the harmonic with the least power (i.e., the smallest contribution to the overall output).

Since F(2) is much more likely to occur, it is now calculated exactly. Scaling by C2 and approximating C6/C2 by the ratio of integers 5/12, the equations become:

F(2) = 12*d0734 + 5*d1625 = d1625 + ((d0734 + d1625 + (d0734 << 1)) << 2)
F(6) = 5*d0734 - 12*d1625 = d0734 + ((d0734 - d1625 - (d1625 << 1)) << 2)
(24/C2)*S(2) ≈ d1625 + ((d0734 + d1625 + (d0734 << 1)) << 2)
(24/C2)*S(6) ≈ d0734 + ((d0734 - d1625 - (d1625 << 1)) << 2)
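A minimal Python sketch of the shift-and-add forms above, with `s2_s6_reference` as a hypothetical floating-point reference. The integer pair 12 and 5 stands in for the ratio C2 : C6 (5/12 ≈ 0.4167 versus C6/C2 ≈ 0.4142). Note the explicit parentheses around each sum before the final `<< 2`: in C-like languages, and in Python, `+` binds more tightly than `<<`.

```python
import math

C2 = math.cos(2 * math.pi / 16)
C6 = math.cos(6 * math.pi / 16)

def f2_f6_shift_add(d0734, d1625):
    """F(2) = 12*d0734 + 5*d1625 and F(6) = 5*d0734 - 12*d1625 using shifts and adds only."""
    f2 = d1625 + ((d0734 + d1625 + (d0734 << 1)) << 2)
    f6 = d0734 + ((d0734 - d1625 - (d1625 << 1)) << 2)
    return f2, f6

def s2_s6_reference(d0734, d1625):
    """Floating-point S(2), S(6) from 2*S(2) = C2*d0734 + C6*d1625 and 2*S(6) = C6*d0734 - C2*d1625."""
    return (C2 * d0734 + C6 * d1625) / 2, (C6 * d0734 - C2 * d1625) / 2
```

Multiplying the integer results by C2/24 recovers S(2) and S(6) approximately; the absolute error is about 0.0011·|d1625| for S(2) and about 0.0011·|d0734| for S(6), since 5·C2/24 and C6/2 differ by roughly 0.0011.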



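Equation 1 shows the remaining equations, for the odd constants, expressed in matrix form:

$$
\begin{bmatrix}F(1)\\F(3)\\F(5)\\F(7)\end{bmatrix}
=
\begin{bmatrix}
C_1 & C_3 & C_5 & C_7\\
C_3 & -C_7 & -C_1 & -C_5\\
C_5 & -C_1 & C_7 & C_3\\
C_7 & -C_5 & C_3 & -C_1
\end{bmatrix}
\begin{bmatrix}d07\\d16\\d25\\d34\end{bmatrix}
$$

Equation 1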



In Equation 1, F(1), F(3), F(5) and F(7) form a vector related to the unquantized 1-D DCT transform coefficients. Note: for u = 1, 3, 5, 7, 2*S(u) = F(u). If the solutions for these four coefficients are found using a "brute force" approach, then 12 additional additions and 16 multiplications would be necessary. However, if the matrix of coefficients is divided by one of the constants in the matrix, then the number of multiplications necessary is reduced by four. This linear system is shown in Equation 2 below.
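$$
\frac{1}{C_5}\begin{bmatrix}F(1)\\F(3)\\F(5)\\F(7)\end{bmatrix}
=
\begin{bmatrix}
C_1/C_5 & C_3/C_5 & 1 & C_7/C_5\\
C_3/C_5 & -C_7/C_5 & -C_1/C_5 & -1\\
1 & -C_1/C_5 & C_7/C_5 & C_3/C_5\\
C_7/C_5 & -1 & C_3/C_5 & -C_1/C_5
\end{bmatrix}
\begin{bmatrix}d07\\d16\\d25\\d34\end{bmatrix}
$$

Equation 2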



Equation 2 shows that the linear set now involves only 12 additions, 12 multiplications and one division. However, the constant C5 in the division can be included in the quantization numbers once per image, while the benefit of only 12 multiplications instead of 16 is reaped for every 8 by 8 block of samples. Furthermore, when the constants of the matrix were scaled by one of the terms, all of the terms became approximations of sums of powers-of-2 within a few successive approximations. This is very beneficial, because a multiplication, which normally is considered to take more than 5 CPU cycles, now becomes a shift and add with a maximum of 3 shifts and 2 additions. Fig. 4 shows all possible combinations of dividing the constants in the matrix by C1, C3, C5, and C7.
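To make the multiplication count concrete, the following Python sketch (names hypothetical) evaluates the odd-part system both ways: directly with the cosine matrix that relates F(1), F(3), F(5), F(7) to d07, d16, d25, d34, and after dividing every entry by C5. In the scaled version, the four entries that were ±C5 become ±1 and are handled with a plain add or subtract, leaving 12 multiplications; the result equals F(u)/C5, and the factor C5 is left for the quantizer.

```python
import math

C = {n: math.cos(n * math.pi / 16) for n in (1, 3, 5, 7)}

# Rows give F(1), F(3), F(5), F(7); columns multiply d07, d16, d25, d34.
ODD = [
    [C[1],  C[3],  C[5],  C[7]],
    [C[3], -C[7], -C[1], -C[5]],
    [C[5], -C[1],  C[7],  C[3]],
    [C[7], -C[5],  C[3], -C[1]],
]

# Every entry divided by C5, computed once (the per-image part of the work).
SCALED = [[m / C[5] for m in row] for row in ODD]

def odd_brute_force(d):
    """F(1), F(3), F(5), F(7) with 16 multiplications and 12 additions."""
    return [sum(row[j] * d[j] for j in range(4)) for row in ODD]

def odd_scaled_by_c5(d):
    """F(u)/C5 for u = 1, 3, 5, 7, plus the number of multiplications performed."""
    out, mults = [], 0
    for row in SCALED:
        acc = 0.0
        for coeff, x in zip(row, d):
            if math.isclose(abs(coeff), 1.0):
                acc += x if coeff > 0 else -x    # unit entry: no multiplication
            else:
                acc += coeff * x
                mults += 1
        out.append(acc)
    return out, mults
```

In a shift-and-add implementation, each remaining floating-point product in `odd_scaled_by_c5` would in turn be replaced by a sum of powers-of-2 approximation of the scaled constant.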

Below, the utility of scaling the constants in the matrix by C5 or by C7 is investigated. Fig. 5 is a table 500 showing the actual values 510 if C5 is used to scale the constants of the matrix. The estimates 512 are sums of powers-of-2, and the error 514 due to each approximation is also stated. C5 was chosen because of the continuity in the terms. It was also chosen because the greatest error occurs in the harmonic with the least power (i.e., the smallest contribution to the overall output). The constant C7 is also a good candidate because the calculations then fit inside 16 bits. This optimization is shown in table 600 of Fig. 6.
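One way to produce sums-of-powers-of-2 estimates like those described for Fig. 5 is a greedy signed-digit expansion. The Python sketch below assumes that approach (the helpers `pow2_approx`, `evaluate` and `shift_add` are hypothetical, and the tolerance and term limit are illustrative); it is not claimed to reproduce the estimates actually listed in Fig. 5 or Fig. 6.

```python
import math

def pow2_approx(value, tol, max_terms=4):
    """Greedy signed power-of-2 expansion of a constant (hypothetical helper).

    Each step adds +/- the power of 2 nearest (in log scale) to the remaining
    residual; stops when the relative error is within tol or max_terms is hit.
    Returns a list of (sign, exponent) pairs.
    """
    terms, approx = [], 0.0
    while len(terms) < max_terms:
        residual = value - approx
        if abs(residual) <= tol * abs(value):
            break
        exp = round(math.log2(abs(residual)))
        sign = 1 if residual > 0 else -1
        terms.append((sign, exp))
        approx += sign * 2.0 ** exp
    return terms

def evaluate(terms):
    """Value represented by a list of (sign, exponent) pairs."""
    return sum(sign * 2.0 ** exp for sign, exp in terms)

def shift_add(x, terms):
    """Multiply integer x by the approximated constant with shifts and adds only.

    Negative exponents become arithmetic right shifts, which round toward minus infinity.
    """
    acc = 0
    for sign, exp in terms:
        part = x << exp if exp >= 0 else x >> -exp
        acc += part if sign > 0 else -part
    return acc

C1, C3, C5, C7 = (math.cos(n * math.pi / 16) for n in (1, 3, 5, 7))
ratios = {"C1/C5": C1 / C5, "C3/C5": C3 / C5, "C7/C5": C7 / C5}
```

For example, with a one percent tolerance, C1/C5 ≈ 1.7654 becomes 2 - 1/4 = 1.75, so `shift_add(64, ...)` computes (64 << 1) - (64 >> 2) = 112 with one shift pair and one subtraction.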

Those skilled in the art will recognize that the above scaling is not meant to be an optimal solution for minimizing error. An exhaustive search may instead be performed to find the optimal number by which to scale the matrix. Also, the scaling factor need not be a function of cos(n·π/16); as shown for the C2 and C6 cases, the best answer was a ratio of integers. For comparison purposes, Fig. 7 is a table 700 showing the number of binary adds used to calculate the unscaled cosine terms within one percent error.
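The choice of scaling constant can be framed as a small search driven by a cost function, as the description suggests. The Python sketch below is a toy version under stated assumptions: candidates are limited to C1, C3, C5 and C7, each remaining ratio is expanded with a hypothetical greedy power-of-2 helper, and a k-term expansion is counted as k - 1 adds. It reports both the total and the worst-case add count per candidate; it is not claimed to reproduce the rankings behind Figs. 5 through 7.

```python
import math

def pow2_terms(value, tol, max_terms=8):
    """Greedy signed power-of-2 expansion of value (hypothetical helper)."""
    terms, approx = [], 0.0
    while len(terms) < max_terms:
        residual = value - approx
        if abs(residual) <= tol * abs(value):
            break
        exp = round(math.log2(abs(residual)))
        sign = 1 if residual > 0 else -1
        terms.append((sign, exp))
        approx += sign * 2.0 ** exp
    return terms

def search_scale(candidates, tol=0.01):
    """Try each candidate as the divisor; cost the other ratios in adds (k terms -> k - 1 adds)."""
    results = {}
    for name, s in candidates.items():
        adds = [len(pow2_terms(v / s, tol)) - 1
                for other, v in candidates.items() if other != name]
        results[name] = {"total_adds": sum(adds), "worst_case_adds": max(adds)}
    return results

C = {f"C{n}": math.cos(n * math.pi / 16) for n in (1, 3, 5, 7)}
costs = search_scale(C)
best_total = min(costs, key=lambda k: costs[k]["total_adds"])
best_worst = min(costs, key=lambda k: (costs[k]["worst_case_adds"], costs[k]["total_adds"]))
```

Changing the tolerance, or swapping in an error-per-constant cost, can change which candidate wins; that sensitivity is why the cost function is treated as a design parameter.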

In summary, the scaled terms approach fundamentally decreases the number of multiplications. First, the scaling of the quantization values is done once per image, while the benefits are reaped for every 8x8 block or, more generally, every N1×N2 block. Second, after scaling, the ratios of coefficients are very nearly sums of powers-of-2 after only a few summations. Thus, multiplications can be replaced by a series of linear shifts and additions. In some cases, these terms may be further converted into integer terms. This enables efficient hardware and very fast software on machines where multiplications are costly (in terms of CPU cycles).
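The once-per-image versus once-per-block split can be sketched as follows. The helpers `fold_scales` and `quantize` are hypothetical, as is the quantizer step of 16 used in the example; the scale factor C2/24 comes from the F(2) equations above, where (C2/24)·F(2) ≈ S(2).

```python
import math

C2 = math.cos(2 * math.pi / 16)
C6 = math.cos(6 * math.pi / 16)

def fold_scales(qsteps, scales):
    """Once per image: fold each coefficient's scale factor into its quantizer step.

    scales[k] is the factor that recovers the true coefficient from the scaled
    output (true ~= scales[k] * scaled[k]); assumes every scale is positive.
    """
    return [q / s for q, s in zip(qsteps, scales)]

def quantize(values, steps):
    """Once per block: a single division and rounding per coefficient."""
    return [round(v / q) for v, q in zip(values, steps)]
```

For instance, with d0734 = 100 and d1625 = 40, the integer F(2) = 12·100 + 5·40 = 1400 quantized with the folded step 16·24/C2 lands in the same bin as the exact S(2) ≈ 53.85 quantized with step 16, without any per-block multiplication by C2/24.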

Fig. 8 illustrates a flow chart 800 of the method used to design the fast DCT using scaled terms according to the present invention. In Fig. 8, the one-dimensional transform coefficient equations for the DCT are obtained 810. To obtain them, the transform is split into even and odd terms by taking sums and differences of the eight input samples, which are then used to calculate the one-dimensional transform coefficients.

After the one-dimensional coefficient equations are obtained, the constants in the 1-D transform matrix are scaled by dividing by one of the constants 820. This reduces the number of multiplications in the odd part of each 8-point 1-D transform by four. After scaling, the ratios of constants are very nearly sums of powers-of-2 after only a few summations. Thus, multiplications can be replaced by a series of linear shifts and additions 830.

Fig. 9 illustrates a block diagram 900 of a printer 920 according to the present 
invention. In Fig. 9, the printer 920 receives data 912 from a host processor 910. 
The data 912 is provided into memory 930 where the data 912 may be arranged into 
blocks. The blocks are then processed by a processor 940, such as a raster image 
processor. The processor 940 provides a compressed print stream representing the 
data to a printhead driving circuit 950. The printhead driving circuit 950 then controls 
the printhead 960 to generate a printout 970 of the data. 

The algorithm designed with reference to Fig. 8 may be tangibly embodied in a computer-readable medium or carrier 990, e.g., one or more of the fixed and/or removable data storage devices illustrated in Fig. 9, or other data storage or data communications devices. The computer program may be loaded into the memory 942 to configure the processor 940 of Fig. 9 for execution. The computer program comprises instructions which, when read and executed by the processor 940 of Fig. 9, cause the processor 940 to perform the steps necessary to execute the steps or elements of the present invention.

Fig. 10 illustrates a data analyzing system 1000 according to the present 
invention. In Fig. 10, a discrete cosine transform 1010 receives a block of data 1012 
to be analyzed. The discrete cosine transform 1010 uses discrete cosine transform 
equations 1020 to generate transformed data 1024. Prior to execution, the discrete 
cosine transform equations 1020 are split into at least one sub-transform having at 
least two transform constants. The at least two transform constants for each 
collection are independently scaled with a scaling term to maintain a substantially 
uniform ratio between the at least two transform constants within the at least one 
collection, wherein the scaling term is chosen according to a predetermined cost 
function. The discrete cosine transformed data 1024 may then be quantized by an 
optional quantizer 1030. 
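One way to read the cost-function selection described above is to try each constant of a collection as the divisor, approximate the resulting ratios by signed powers of 2, and keep the divisor that scores best. The Python sketch below makes several assumptions. The collection is the odd-part 8-point DCT constants, the precision is 8 fractional bits, and the three candidate costs mirror those recited in the claims: total additions, worst-case additions, and largest error per constant, with error measured here after undoing the scale. All function names are hypothetical.

```python
import math


def csd_terms(value: float, frac_bits: int) -> list[tuple[int, int]]:
    """Signed powers-of-2 (CSD) approximation of value at frac_bits precision."""
    n = round(value * (1 << frac_bits))
    terms, bit = [], 0
    while n != 0:
        if n & 1:
            digit = 2 - (n & 3)  # +1 or -1, chosen so the next bit is 0
            n -= digit
            terms.append((digit, bit - frac_bits))
        n >>= 1
        bit += 1
    return terms


def scaling_costs(constants: list[float], divisor: float, frac_bits: int) -> dict:
    """Score one choice of divisor under three candidate cost functions."""
    adds, errors = [], []
    for c in constants:
        ratio = c / divisor
        terms = csd_terms(ratio, frac_bits)
        approx = sum(s * 2.0 ** e for s, e in terms)
        adds.append(max(len(terms) - 1, 0))        # adds/subtracts for x * ratio
        errors.append(abs(approx * divisor - c))   # error in the unscaled constant
    return {"total_adds": sum(adds),
            "worst_case_adds": max(adds),
            "max_error": max(errors)}


def choose_divisor(constants: list[float], frac_bits: int, cost: str) -> float:
    """Return the constant from the collection that minimizes the named cost."""
    return min(constants, key=lambda d: scaling_costs(constants, d, frac_bits)[cost])


# Assumed collection: odd-part constants cos(k*pi/16), k = 1, 3, 5, 7.
odd = [math.cos(k * math.pi / 16) for k in (1, 3, 5, 7)]
for cost in ("total_adds", "worst_case_adds", "max_error"):
    best = choose_divisor(odd, frac_bits=8, cost=cost)
    print(cost, round(best, 5), scaling_costs(odd, best, 8))
```

Different cost functions may select different divisors. Minimizing total additions favours average speed, minimizing the worst case bounds the longest shift-and-add chain, and minimizing error favours accuracy.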

Fig. 11 illustrates another data analyzing system 1100 according to the
present invention. In Fig. 11, a discrete cosine transform 1110 receives a block of
data 1112 to be analyzed. The discrete cosine transform 1110 uses discrete cosine
transform equations 1120 to generate transformed data 1124. Prior to execution,
the discrete cosine transform equations 1120 are split into at least one
sub-transform having at least two transform constants. The at least two transform
constants for each collection are independently scaled with a scaling term to
maintain a substantially uniform ratio between the at least two transform constants
within the at least one collection, wherein the scaling term may be chosen according
to a predetermined cost function. The discrete cosine transformed data 1124 may
then be compared to scaled comparison values in an optional comparator 1130.
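Because the transformed data 1124 come out multiplied by the scaling terms, the comparison values can carry the same scaling instead of the data being unscaled. A minimal Python sketch, assuming the comparator tests coefficient magnitudes against per-coefficient thresholds and that every scale factor is positive. The function names, thresholds, and scale factors are illustrative only.

```python
def scale_thresholds(thresholds: list[float], scales: list[float]) -> list[float]:
    """Fold the per-coefficient scale factors into the comparison values once."""
    return [t * s for t, s in zip(thresholds, scales)]


def compare_scaled(scaled_coeffs: list[float], scaled_thresholds: list[float]) -> list[bool]:
    """Flag coefficients whose magnitude exceeds the matching scaled threshold.

    In exact arithmetic, for a positive scale s, |c * s| > t * s holds exactly
    when |c| > t, so the scaled transform output never needs to be divided
    back down before the comparison.
    """
    return [abs(c) > t for c, t in zip(scaled_coeffs, scaled_thresholds)]


# Illustrative values: true coefficients, assumed scale factors, thresholds.
true_coeffs = [10.0, -3.0, 0.5, 7.0]
scales = [1.25, 0.8, 2.0, 1.0]
thresholds = [5.0, 4.0, 1.0, 7.5]

scaled_coeffs = [c * s for c, s in zip(true_coeffs, scales)]  # transform output
flags = compare_scaled(scaled_coeffs, scale_thresholds(thresholds, scales))
print(flags)  # [True, False, False, False]
```

The thresholds are scaled once, ahead of time, so the per-block cost of the comparison is unchanged.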

The foregoing description of the exemplary embodiment of the invention has 
been presented for the purposes of illustration and description. It is not intended to 
be exhaustive or to limit the invention to the precise form disclosed. Many 
modifications and variations are possible in light of the above teaching. It is 
intended that the scope of the invention be limited not by this detailed description,
but rather by the claims appended hereto. 
WHAT IS CLAIMED IS:

1. A method for generating faster discrete cosine transforms, comprising:
arranging discrete cosine transform equations into at least one collection having at least two discrete cosine transform constants;
scaling the discrete cosine transform equations in the at least one collection by dividing each of the discrete cosine transform constants in the collection by one of the discrete cosine transform constants from the at least one collection; and
representing each of the scaled discrete cosine transform constants with estimated scaled discrete cosine transform constants approximated by sums of powers-of-2.

2. The method of claim 1 further comprising separating an image into at least one block and transforming the block into transformed data by performing matrix multiplication on the discrete cosine transform equations based upon binary arithmetic using the estimated scaled discrete cosine transform constants and performing linear shifts and additions.

3. The method of claim 1 wherein the scaling the discrete cosine transform equations in the at least one collection by dividing each of the discrete cosine transform constants in the at least one collection by one of the discrete cosine transform constants from the at least one collection saves multiplications.

4. The method of claim 1 wherein the discrete cosine transform constant
chosen for scaling the discrete cosine transform equations in the at least one 
collection is selected according to a predetermined cost function. 

5. The method of claim 4 wherein the cost function minimizes a number 
of add operations. 

6. The method of claim 4 wherein the cost function minimizes a worst 
case number of add operations. 

7. The method of claim 4 wherein the cost function minimizes an error per 
constant resulting from the approximations. 

8. The method of claim 2 wherein the transforming the block into 
transformed data further comprises using at least one set of one dimensional 
discrete cosine transform equations. 

9. The method of claim 8 wherein the discrete cosine transform constants 
are obtained by splitting the discrete cosine transform constants into even and odd 
terms by obtaining sums and differences of input samples. 

10. The method of claim 2 wherein the block is an N1xN2 block.

11. The method of claim 10 wherein N1 = N2 = 8.

12. A data compression system, the data compression system comprising a discrete cosine transformer for applying a discrete cosine transform to decorrelate data into discrete cosine transform equations, the discrete cosine transform equations having been formed by arranging the discrete cosine transform equations into at least one collection having at least two discrete cosine transform constants, scaling the discrete cosine transform equations in the at least one collection by dividing each of the discrete cosine transform constants in the collection by one of the discrete cosine transform constants from the at least one collection and representing each of the scaled discrete cosine transform constants with estimated scaled discrete cosine transform constants approximated by sums of powers-of-2.

13. The data compression system of claim 12 further comprising a quantizer for quantizing the transformed data into quantized data to reduce the number of bits needed to represent the transform coefficients.

14. The data compression system of claim 12 wherein the discrete cosine transformer further separates an image into at least one block and transforms the block into transformed data using the discrete cosine transform equations based upon binary arithmetic using the estimated scaled discrete cosine transform constants and performing linear shifts and additions.

15. The data compression system of claim 12 wherein the transformer executes equations that save multiplication operations, the equations having been formed by scaling the discrete cosine transform equations in the at least one collection by dividing each of the discrete cosine transform constants in the at least one collection by one of the discrete cosine transform constants from the at least one collection.

16. The data compression system of claim 15 further comprising an entropy encoder for further compressing the quantized coefficients losslessly.

17. The data compression system of claim 12 wherein the discrete cosine transform constant used for scaling the discrete cosine transform equations in the at least one collection is selected according to a predetermined cost function.

18. The data compression system of claim 17 wherein the cost function minimizes a number of add operations.

19. The data compression system of claim 17 wherein the cost function minimizes a worst case number of add operations.

20. The data compression system of claim 17 wherein the cost function minimizes an error per constant resulting from the approximations.

21. The data compression system of claim 12 wherein the discrete cosine transformer uses at least one set of one dimensional discrete cosine transform equations.

22. The data compression system of claim 21 wherein the equations split the discrete cosine transform coefficients into even and odd terms by obtaining sums and differences of input samples.

23. The data compression system of claim 14 wherein the block is an N1xN2 block.

24. The data compression system of claim 23 wherein N1 = N2 = 8.

25. A printer, comprising:
a memory for storing data;
a processor for processing the data to provide a compressed print stream output; and
a printhead driving circuit for controlling a printhead to generate a printout of the data;
wherein the processor applies a discrete cosine transform to decorrelate data into transform coefficients using discrete cosine transform equations, the discrete cosine transform equations having been formed by arranging the discrete cosine transform equations into at least one collection having at least two discrete cosine transform constants, scaling the discrete cosine transform equations in the at least one collection by dividing each of the discrete cosine transform constants in the collection by one of the discrete cosine transform constants from the at least one collection and representing each of the scaled discrete cosine transform constants with estimated scaled discrete cosine transform constants approximated by sums of powers-of-2.

26. The printer of claim 25 wherein the processor further separates an image into at least one block and transforms the block into transformed data by performing matrix multiplication on the discrete cosine transform equations based upon binary arithmetic using the estimated scaled discrete cosine transform constants and performing linear shifts and additions.

27. The printer of claim 25 wherein the processor executes equations that save multiplication operations, the equations having been formed by scaling the discrete cosine transform equations in a collection by dividing each of the discrete cosine transform constants in the at least one collection by one of the discrete cosine transform constants from the at least one collection.

28. The printer of claim 25 wherein the discrete cosine transform constant used in scaling the discrete cosine transform equations in the at least one collection is selected according to a predetermined cost function.

29. The printer of claim 28 wherein the cost function minimizes a number of add operations.

30. The printer of claim 28 wherein the cost function minimizes a worst case number of add operations.

31. The printer of claim 28 wherein the cost function minimizes an error per constant resulting from the approximations.

32. The printer of claim 25 wherein the processor uses at least one set of one dimensional discrete cosine transform equations.

33. The printer of claim 32 wherein the processor splits the discrete cosine transform coefficients into even and odd terms by obtaining sums and differences of input samples.

34. The printer of claim 26 wherein the block is an N1xN2 block.

35. The printer of claim 34 wherein N1 = N2 = 8.

36. An article of manufacture comprising a program storage medium readable by a computer, the medium tangibly embodying one or more programs of instructions executable by the computer to use equations created by a method for generating faster discrete cosine transforms, the method comprising:
arranging discrete cosine transform equations into at least one collection having at least two discrete cosine transform constants;
scaling the discrete cosine transform equations in the at least one collection by dividing each of the discrete cosine transform constants in the collection by one of the discrete cosine transform constants from the at least one collection; and
representing each of the scaled discrete cosine transform constants with estimated scaled discrete cosine transform constants approximated by sums of powers-of-2.

37. The article of manufacture of claim 36 further comprising separating an image into at least one block and transforming the block into transformed data by using discrete cosine transform equations based upon binary arithmetic using the estimated scaled discrete cosine transform constants and performing linear shifts and additions.

38. The article of manufacture of claim 36 wherein the scaling the discrete cosine transform equations in the at least one collection by dividing each of the discrete cosine transform constants in the at least one collection by one of the discrete cosine transform constants from the at least one collection saves multiplications.

39. The article of manufacture of claim 36 wherein the discrete cosine transform constant chosen for scaling the discrete cosine transform equations in the at least one collection is selected according to a predetermined cost function.

40. The article of manufacture of claim 39 wherein the cost function minimizes a number of add operations.

41. The article of manufacture of claim 39 wherein the cost function minimizes a worst case number of add operations.

42. The article of manufacture of claim 39 wherein the cost function minimizes an error per constant resulting from the approximations.

43. The article of manufacture of claim 36 wherein the transforming the block into transformed data further comprises using at least one set of one dimensional discrete cosine transform equations.

44. The article of manufacture of claim 43 wherein the discrete cosine transform constants are obtained by splitting the discrete cosine transform constants into even and odd terms by obtaining sums and differences of input samples.

45. The article of manufacture of claim 37 wherein the block is an N1xN2 block.

46. The article of manufacture of claim 45 wherein N1 = N2 = 8.

47. A data analysis system, comprising:
a memory for storing discrete cosine transform equations having been formed by arranging the discrete cosine transform equations into at least one collection having at least two discrete cosine transform constants, scaling the discrete cosine transform equations in the at least one collection by dividing each of the discrete cosine transform constants in the collection by one of the discrete cosine transform constants from the at least one collection and representing each of the scaled discrete cosine transform constants with estimated scaled discrete cosine transform constants approximated by sums of powers-of-2; and
a transformer for applying the transform equations to perform a discrete cosine transform to decorrelate data into discrete cosine transform coefficients.

48. The data analysis system of claim 47 wherein the transformer further separates an image into at least one block and transforms the block into transformed data by using the discrete cosine transform equations based upon binary arithmetic using the estimated scaled discrete cosine transform constants and performing linear shifts and additions.

49. The data analysis system of claim 47 wherein the discrete cosine transform constant used for scaling the discrete cosine transform equations in the at least one collection is selected according to a predetermined cost function.

ABSTRACT

Faster discrete cosine transforms that use scaled terms are disclosed. Prior
to application of a transform, equations are arranged into collections. Each
collection is scaled by dividing each of the discrete cosine transform constants in the
collection by one of the discrete cosine transform constants from the collection.
Each of the scaled discrete cosine transform constants is then represented with
approximated sums of powers-of-2. During the execution phase, a block of input
data is obtained, a series of predetermined sums and shifts is performed on the
data, and the output results are saved.

City of Residence: Longmont
State or Country of Residence: Colorado
Country of Citizenship: USA

Post Office Address
Street Address: 1100 E. 17th Ave. SU201
City: Longmont
State & Zip Code or Country: Colorado 80501


Full Name of Second Inventor, If any
Family Name: Trelewicz
First Given Name: Jennifer
Second Given Name: Quirin

Residence and Citizenship
City of Residence: Superior
State or Country of Residence: Colorado
Country of Citizenship: USA

Post Office Address
Street Address: SJ285 E. ImperHTfcsr
City: Superior
State & Zip Code or Country: Colorado 80027

Date: 20 OcJ^l^TVt)

Full Name of Third Inventor, If any
Family Name: Mitchell
First Given Name: Joan
Second Given Name: LaVerne

Residence and Citizenship
City of Residence: Longmont
State or Country of Residence: Colorado
Country of Citizenship: USA

Post Office Address
Street Address: 2400 17th Avenue, Unit 103D
City: Longmont
State & Zip Code or Country: Colorado 80503


OCT 22 2000 12^4g 